Checkpoint connector bugfixes #10647
[🤖]: Hi @jstjohn 👋, I just wanted to let you know that a CICD pipeline for this PR just finished successfully ✨ So it might be time to merge this PR or to get some approvals 🚀 But I'm just a 🤖 so I'll leave it to you what to do next. Have a great day! //cc @ko3n1g
Thanks a lot.
CI seems to have an issue with CI assets; will retry it later.
…_utils instead (to avoid side-effects to nemo1.0 ckpts) Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
[🤖]: Hi @jstjohn 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
* Update checkpoint connector nemo_save to match current folder hierarchy
  Signed-off-by: John St John <jstjohn@nvidia.com>
* Address PR feedback
  Signed-off-by: John St John <jstjohn@nvidia.com>
* Address divergent code issue in ckpt_to_dir
  Signed-off-by: John St John <jstjohn@nvidia.com>
* revert changes to nemo.utils.model_utils and use model.lightning.ckpt_utils instead (to avoid side-effects to nemo1.0 ckpts)
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: John St John <jstjohn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
This needs #10786
…0.0` (#10770)
* Checkpoint connector bugfixes (#10647)
  * Update checkpoint connector nemo_save to match current folder hierarchy
    Signed-off-by: John St John <jstjohn@nvidia.com>
  * Address PR feedback
    Signed-off-by: John St John <jstjohn@nvidia.com>
  * Address divergent code issue in ckpt_to_dir
    Signed-off-by: John St John <jstjohn@nvidia.com>
  * revert changes to nemo.utils.model_utils and use model.lightning.ckpt_utils instead (to avoid side-effects to nemo1.0 ckpts)
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * Apply isort and black reformatting
    Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  ---------
  Signed-off-by: John St John <jstjohn@nvidia.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: ko3n1g <ko3n1g@users.noreply.github.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use ckpt_to_weights_subdir in restore (#10786)
  * use ckpt_to_weights_subdir in restore
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * make ckpt_to_{weight,context}_subdir idempotent
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * Apply isort and black reformatting
    Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  ---------
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
  Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* py syntax fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: John St John <jstjohn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: ko3n1g <ko3n1g@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: ko3n1g <ko3n1g@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
What does this PR do?
Get checkpoint connector working for bionemo (see NVIDIA/bionemo-framework#180)
Changelog
* Update checkpoint connector nemo_save to use the /weights and /context subdirectory scheme so checkpoint loaders work properly with checkpoints created by this method.
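
To illustrate the layout described above, here is a minimal Python sketch of resolving the weights and context subdirectories of a checkpoint directory in an idempotent way, in the spirit of the "make ckpt_to_{weight,context}_subdir idempotent" commit. The helper names (`to_subdir`, `weights_dir`, `context_dir`) and the example path are hypothetical and written only for this illustration; this is not the actual NeMo implementation, and the subdirectory names are assumed from the /weights and /context paths mentioned in the changelog.

```python
from pathlib import Path

# Assumed subdirectory names, taken from the /weights and /context
# paths mentioned in the changelog.
WEIGHTS_SUBDIR = "weights"
CONTEXT_SUBDIR = "context"


def to_subdir(ckpt_dir, subdir: str) -> Path:
    """Return ckpt_dir/subdir, or ckpt_dir itself if it already ends in subdir.

    Hypothetical helper: applying it twice gives the same result as
    applying it once, which is what "idempotent" means here.
    """
    path = Path(ckpt_dir)
    if path.name == subdir:
        return path
    return path / subdir


def weights_dir(ckpt_dir) -> Path:
    """Hypothetical: where model weights would live under a checkpoint root."""
    return to_subdir(ckpt_dir, WEIGHTS_SUBDIR)


def context_dir(ckpt_dir) -> Path:
    """Hypothetical: where the serialized context would live."""
    return to_subdir(ckpt_dir, CONTEXT_SUBDIR)


if __name__ == "__main__":
    root = Path("results") / "my_ckpt"  # example path, not from the PR
    print(weights_dir(root))
    print(weights_dir(weights_dir(root)))  # same path as the line above
    print(context_dir(root))
```

Because the helper checks only the final path component, a caller can pass either the checkpoint root or an already-resolved subdirectory and get the same result, so save and restore code do not need to track which form they were handed.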